efficient machine learning
LithOS: An Operating System for Efficient Machine Learning on GPUs
Coppock, Patrick H., Zhang, Brian, Solomon, Eliot H., Kypriotis, Vasilis, Yang, Leon, Sharma, Bikash, Schatzberg, Dan, Mowry, Todd C., Skarlatos, Dimitrios
The surging demand for GPUs in datacenters for machine learning (ML) has made efficient GPU utilization crucial. However, meeting the diverse needs of ML models while optimizing resource usage is challenging. To enable transparent, fine-grained GPU management that maximizes utilization and energy efficiency while maintaining strong isolation, an operating system (OS) approach is needed. This paper introduces LithOS, a first step toward a GPU OS. LithOS includes the following new abstractions and mechanisms for efficient GPU resource management: (i) a novel TPC Scheduler that supports spatial scheduling at the granularity of individual TPCs, unlocking efficient TPC stealing between workloads; (ii) transparent kernel atomization to reduce head-of-line blocking and enable dynamic resource reallocation mid-execution; (iii) a lightweight hardware right-sizing mechanism that determines the minimal TPC resources needed per atom; and (iv) a transparent power management mechanism that reduces power consumption based on in-flight work behavior. We implement LithOS in Rust and evaluate its performance across extensive ML environments, comparing it to state-of-the-art solutions from NVIDIA and prior research. For inference stacking, LithOS reduces tail latencies by 13x compared to MPS; compared to the best SotA, it reduces tail latencies by 3x while improving aggregate throughput by 1.6x. In hybrid inference-training stacking, LithOS reduces tail latencies by 4.7x compared to MPS; compared to the best SotA, it reduces tail latencies by 1.18x while improving aggregate throughput by 1.35x. Finally, for a modest performance hit under 4%, LithOS's right-sizing provides a quarter of GPU capacity savings on average, while for a 7% hit, its power management yields a quarter of a GPU's energy savings. Overall, LithOS increases GPU efficiency, establishing a foundation for future OS research on GPUs.
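The abstract does not say how atoms are formed, so the following is only a minimal sketch of the general idea behind kernel atomization. It assumes an atom is a contiguous range of a kernel's thread blocks. The function name `atomize` and the `atom_size` parameter are hypothetical and are not taken from LithOS.

```python
from typing import List, Tuple


def atomize(num_blocks: int, atom_size: int) -> List[Tuple[int, int]]:
    """Split block indices [0, num_blocks) into half-open ranges.

    Each range (an "atom" in this sketch) covers at most atom_size blocks,
    so a scheduler could reconsider resource allocation between atoms
    instead of waiting for the whole kernel to finish.
    """
    if num_blocks < 0 or atom_size <= 0:
        raise ValueError("num_blocks must be >= 0 and atom_size must be > 0")
    return [
        (start, min(start + atom_size, num_blocks))
        for start in range(0, num_blocks, atom_size)
    ]


atoms = atomize(num_blocks=10, atom_size=4)
print(atoms)  # [(0, 4), (4, 8), (8, 10)]
```

Smaller atoms give a scheduler more points at which to reallocate resources, at the likely cost of more launch overhead. The abstract does not state how LithOS balances the two.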
CapyMOA: Efficient Machine Learning for Data Streams in Python
Gomes, Heitor Murilo, Lee, Anton, Gunasekara, Nuwan, Sun, Yibin, Cassales, Guilherme Weigert, Liu, Justin, Heyden, Marco, Cerqueira, Vitor, Bahri, Maroua, Koh, Yun Sing, Pfahringer, Bernhard, Bifet, Albert
CapyMOA is an open-source library designed for efficient machine learning on streaming data. It provides a structured framework for real-time learning and evaluation, featuring a flexible data representation. CapyMOA includes an extensible architecture that allows integration with external frameworks such as MOA and PyTorch, facilitating hybrid learning approaches that combine traditional online algorithms with deep learning techniques. By emphasizing adaptability, scalability, and usability, CapyMOA allows researchers and practitioners to tackle dynamic learning challenges across various domains.
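A common way to evaluate a learner on a stream is test-then-train (prequential) evaluation, in which each instance is first used for prediction and only then for training. The sketch below uses plain Python and deliberately avoids the CapyMOA API. `MajorityClassLearner` and `prequential_accuracy` are hypothetical names introduced only for illustration.

```python
from collections import Counter
from typing import Any, Hashable, Iterable, Optional, Tuple


class MajorityClassLearner:
    """Toy online learner: always predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, x: Any) -> Optional[Hashable]:
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def train(self, x: Any, y: Hashable) -> None:
        self.counts[y] += 1


def prequential_accuracy(
    stream: Iterable[Tuple[Any, Hashable]], learner: MajorityClassLearner
) -> float:
    """Test on each instance first, then train on it; return overall accuracy."""
    correct = 0
    seen = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.train(x, y)
        seen += 1
    return correct / seen if seen else 0.0


stream = [((0.1,), "a"), ((0.4,), "a"), ((0.9,), "b"), ((0.2,), "a")]
print(prequential_accuracy(stream, MajorityClassLearner()))  # 0.5
```

Because every instance is seen only once and in order, the loop needs no separate hold-out set and works on streams of unbounded length.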
Efficient Machine Learning in 40 minutes and 2 PHP scripts - timeNough
Artificial Intelligence (AI) and Machine Learning (ML) are big topics these days, no matter what the domain is: finance, fintech, politics, health, education, science, blockchain, and so on. Would it still be possible to catch up if you missed the start and had only some basic knowledge of PHP or JavaScript? Some startups have even used ML in the past five years as an opportunity to show off, so as to make their products and services more valuable to customers and investors. ML and AI can change everything if they are integrated into the value proposition. However, this is not something that should be taken lightly. ML is becoming more difficult to access, as it requires specialists in AI and ML: people who have studied the field, or who have dedicated time to being trained and certified in the subject. In certain cases, this makes it harder for the general public to fully understand. There will be no mention of formulas, operations, series of numbers, or variances in this blog post. Since I am not a math nerd, I did not take the time before writing this article to fully understand the deep skeleton of Machine Learning programs: the probabilities, the calculations, the components, and so on.
Compilation and Optimizations for Efficient Machine Learning on Embedded Systems
Zhang, Xiaofan, Chen, Yao, Hao, Cong, Huang, Sitao, Li, Yuhong, Chen, Deming
Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in areas such as computer vision, natural language processing, and virtual reality. However, DNN-based ML applications also bring much greater computational and storage requirements, which are particularly challenging for embedded systems with limited compute/storage resources, tight power budgets, and small form factors. Challenges also come from the diverse application-specific requirements, including real-time responses, high-throughput performance, and reliable inference accuracy. To address these challenges, we introduce a series of effective design methodologies, including efficient ML model designs, customized hardware accelerator designs, and hardware/software co-design strategies to enable efficient ML applications on embedded systems.
Moving from Red AI to Green AI: A Practitioner's Guide to Efficient Machine Learning
In our previous post, we talked about how red AI means adding computational power to "buy" more accurate models in machine learning, and especially in deep learning. We also talked about the increased interest in green AI, in which we not only measure the quality of a model based on accuracy but also how big and complex it is. We covered different ways of measuring model efficiency and showed ways to visualize this and select models based on it. Maybe you also attended the webinar? If not, take a look at the recording where we also cover a few of the points we'll describe in this blog post.
Efficient Machine Learning
If you're a machine learning specialist looking to make the transition into real-world AI applications, this comprehensive course will be your guide to scaling up your machine learning model to the best possible state. You'll learn everything you need to move your machine learning model to the next stage. This course is designed for both beginners with some programming experience and experienced developers looking to make the jump to Data Science! You'll learn the machine learning, AI, and data mining techniques real employers are looking for, including:
Four Popular Feature Selection Methods for Efficient Machine Learning in Python
Feature selection is one of the most important parts of machine learning. Most real-world datasets have many features, but not all of them are necessary for a given machine learning algorithm. Using too many unnecessary features can cause a lot of problems. The first is computation cost: an unnecessarily large dataset takes an unnecessarily long time to run the algorithm on.
Efficient Machine Learning
NEW, 4.1 (12 ratings), Created by Usama Albaghdady, English. If you're a machine learning specialist looking to make the transition into real-world AI applications, this comprehensive course will be your guide to scaling up your machine learning model to the best possible state. You'll learn everything you need to move your machine learning model to the next stage. This course is designed for both beginners with some programming experience and experienced developers looking to make the jump to Data Science! You'll learn the machine learning, AI, and data mining techniques real employers are looking for, including:
Efficient Machine Learning in H2O with R and Python, Part 1 - DATAVERSITY
One of the major benefits of working with R and Python for analytics is that there are always new and freely available treats from their vibrant open source ecosystems. And now more and more, data scientists are able to reap the benefits of working with data in R, Python and other platforms simultaneously, as vendors introduce performant products with APIs to both R and Python -- in addition to perhaps Java, Scala and Spark. An example with which I'm currently quite smitten is H2O. H2O brands itself as "AI for Business" that "makes it possible for anyone to easily apply math and predictive analytics to solve today's most challenging business problems." What sets H2O apart is its comprehensive, open source, cross-platform machine learning infrastructure, architected from the ground up for scalability and speed.
Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets
This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its leaves. We show how the ADtree can be used to accelerate Bayes net structure finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss the possible uses of ADtrees in other machine learning methods, and discuss the merits of ADtrees in comparison with alternative representations such as kd-trees, R-trees and Frequent Sets.
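To make the counting task concrete, the sketch below builds a sparse contingency table by direct counting and then answers a conjunctive query from that table rather than rescanning the records. It is not an ADtree. It only illustrates the baseline task and the idea of never storing zero counts. The function names and the toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def contingency_table(records: Sequence[Record], attributes: List[str]) -> Counter:
    """Count each observed combination of attribute values.

    Combinations that never occur get no entry, so zero counts use no memory.
    """
    return Counter(tuple(r[a] for a in attributes) for r in records)


def count_matching(table: Counter, attributes: List[str], query: Dict[str, str]) -> int:
    """Answer a conjunctive query (attr = value AND ...) from the cached table."""
    positions = [(attributes.index(a), v) for a, v in query.items()]
    return sum(
        n for key, n in table.items()
        if all(key[i] == v for i, v in positions)
    )


records = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
]
attrs = ["outlook", "windy"]
table = contingency_table(records, attrs)

print(table[("sunny", "no")])                          # 2
print(count_matching(table, attrs, {"windy": "yes"}))  # 2
```

Once the table is built, the cost of a query depends on the number of non-zero cells rather than on the number of records. The ADtree described in the abstract goes further by also omitting counts that can be deduced from other counts and by not fully expanding the tree near its leaves.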